The classical product state capacity of a noisy quantum channel with memory is investigated. 
A forgetful noise-memory channel is constructed by Markov switching between two depolarizing 
channels which introduces non-Markovian noise correlations between successive channel uses. The 
computation of the capacity is reduced to an entropy computation for a function of a Markov 
process. A reformulation in terms of algebraic measures then enables its calculation. The effects 
of the hidden-Markovian memory on the capacity are explored. An increase in noise-correlations is 
found to increase the capacity. 

PACS numbers: 03.67.Hk, 89.70.Kn, 02.60.Cb 



I. INTRODUCTION 

Quantum mechanics brings strange and wonderful fea- 
tures to the field of information theory. It introduces 
new information resources such as qubits with the power 
of superposition but also teasing restrictions such as the 
no-cloning theorem. We are interested in the possibil- 
ity of the boosted transmission of classical information 
through a quantum channel with memory and no prior 
entanglement. 

Great strides have been made in understanding the 
capacity of quantum channels. For example, the celebrated
Holevo-Schumacher-Westmoreland (HSW) theorem [1]
gives an expression for the classical capacity of a
noisy memoryless quantum channel with product state 
inputs. The memoryless channel restriction has since 
been extended to, so called, forgetful memory channels 
Q. The inclusion of memory is the next step in the 
attempt of accurately modelling the complicated noise- 
correlated real world. Now that these initial seeds of the 
theoretical framework are in place, it is enlightening to 
use these tools, in specific cases, to analytically study the 
new effects that noise with memory has on the capacity. 

We construct a forgetful channel and incorporate 
memory effects by Markov switching between two sub- 
channels. In order to investigate the classical product 
state capacity of this channel we must look at the entropy
of the classical output. The output sequence of
qubits and their associated errors are correlated. To 
manage this complicated conditional dependence, we use 
the hidden Markov nature of the process to reformulate 
the problem using the algebraic measure construction [i[ . 
The algebraic measure approach allows us to derive an 
expression for the asymptotic entropy rate. We then ex- 
plore the effects that our non-Markovian memory has on 
the classical product state capacity. 

This paper is structured as follows. In Section II we
take a closer look at the quantity we are investigating,
namely the product state classical capacity. In Section
III we construct the forgetful channel with Markovian
noise correlations. In Section IV algebraic measures are
introduced, which are used in Section V to reformulate
the problem. Finally, in Section VI we show how this
allows us to easily calculate the capacity of the channel
numerically.



II. CLASSICAL CAPACITY OF QUANTUM 
CHANNELS 



The information process we are studying is classical 
communication through a noisy quantum channel. The 
layout of this section largely follows that in 

With the classical information we want to send encoded
using an input alphabet $A = \{1, \ldots, a\}$, we choose
for every element $i \in A$ an encoding quantum state $\rho_i$
on a Hilbert space $\mathfrak{H}$. This input state is then
transmitted using a quantum channel
$\Lambda : \mathcal{B}(\mathfrak{H}) \to \mathcal{B}(\mathfrak{K})$. For
the channel to be a valid quantum channel it must be a
completely positive trace preserving map.

Transmitting the element $i \in A$ results in a quantum
state $R_i = \Lambda(\rho_i)$ being received on the output side. On
this side, the received quantum state is measured using
a resolution of identity in $\mathfrak{K}$. This resolution of identity
is a set of positive operators $X = \{X_j\}$ on $\mathfrak{K}$ such that
$\sum_j X_j = \mathbb{1}$.

The conditional probability of the receiver measuring
$j$, when the input $i$ was sent, is given by $p(j|i) = \mathrm{Tr}\, R_i X_j$.
If at the input side the element $i$ is sent with a probability
$\pi_i$, the amount of information that will be received is
quantified by the classical Shannon information,

$$ I_1(\pi, \rho, X) = \sum_{i,j \in A} \pi_i\, p(j|i) \log \frac{p(j|i)}{\sum_{k \in A} \pi_k\, p(j|k)} . \tag{1} $$

If the sender is allowed to use the channel $n$ times,
the channel use can be described by the product channel
$\Lambda_n = \otimes^n \Lambda$ on $\otimes^n \mathfrak{H} = \mathfrak{H} \otimes \cdots \otimes \mathfrak{H}$.
The input alphabet is now $A^n$ and the probability distribution of a word
$w = (i_1, \ldots, i_n) \in A^n$ being sent is again denoted by $\pi_w$. The
codeword corresponding to the input $w$ is given by

$$ \rho_w = \rho_{i_1} \otimes \cdots \otimes \rho_{i_n} $$

and results in $R_w = R_{i_1} \otimes \cdots \otimes R_{i_n}$ being received. The
conditional probability and the Shannon information $I_n$
for the $n$-product of the channel can now be introduced
completely analogously to Eq. (1), with the summations
over $A^n$ instead of $A$.

The maximum amount of information that can be sent
with $n$ channel uses is now given by

$$ C_n(\Lambda) = \sup_{\pi, \rho, X} I_n(\pi, \rho, X) . $$

Due to the fact that $C_n + C_m \le C_{m+n}$, the limit

$$ C_{\text{class}}(\Lambda) = \lim_{n \to \infty} \frac{C_n(\Lambda)}{n} $$

exists. Using Shannon's coding theorem, we see that $C_{\text{class}}$
is the least upper bound of the rate of information that
can be transmitted with asymptotically vanishing error.

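The HSW theorem [1] gives an expression for this classical
product state capacity of noisy memoryless quantum
channels,

$$ C_{\text{class}}(\Lambda) = \chi^*(\Lambda) := \sup_{\pi, \rho} \chi(\Lambda) , $$

where $\chi$ is the Holevo $\chi$ quantity

$$ \chi(\Lambda) = S\Big( \sum_i \pi_i \Lambda(\rho_i) \Big) - \sum_i \pi_i S\big( \Lambda(\rho_i) \big) . $$

Due to the concavity of the von Neumann entropy, the
supremum can in fact be taken over pure states $\rho_i$.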
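
As a concrete illustration of the Holevo $\chi$ quantity, the following minimal Python sketch evaluates it for an ensemble of single-qubit output states $R_i$. The function names (`qubit_entropy`, `holevo_chi`) and the representation of states as 2x2 nested lists are assumptions made for this example only; entropies are taken in bits.

```python
import math

def qubit_entropy(rho):
    """Von Neumann entropy (in bits) of a 2x2 density matrix given as nested lists."""
    # For a trace-one 2x2 matrix the eigenvalues are (1 +/- sqrt(1 - 4 det)) / 2.
    det = (rho[0][0] * rho[1][1]).real - abs(rho[0][1]) ** 2
    root = math.sqrt(max(0.0, 1.0 - 4.0 * det))
    entropy = 0.0
    for p in ((1.0 + root) / 2.0, (1.0 - root) / 2.0):
        if p > 0.0:
            entropy -= p * math.log2(p)
    return entropy

def holevo_chi(probs, states):
    """chi = S(sum_i pi_i R_i) - sum_i pi_i S(R_i) for output states R_i."""
    average = [[sum(p * R[r][c] for p, R in zip(probs, states)) for c in range(2)]
               for r in range(2)]
    return qubit_entropy(average) - sum(p * qubit_entropy(R)
                                        for p, R in zip(probs, states))

# Two orthogonal pure outputs sent with equal probability carry one full bit.
zero = [[1.0, 0.0], [0.0, 0.0]]
one = [[0.0, 0.0], [0.0, 1.0]]
chi_orthogonal = holevo_chi([0.5, 0.5], [zero, one])
```

Sending the same state for both letters gives $\chi = 0$, since the average state then has the same entropy as each output state.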

The memoryless channel restriction has recently been 
weakened to include, so called, forgetful memory chan- 
nels. For such channels, the classical product state ca- 
pacity has been shown [3J] to correspond to 



$$ C^* = \lim_{n \to \infty} \frac{1}{n}\, C_{\text{class}}(\Lambda_n) , $$

where $\Lambda_n$ is a channel representing the transmission of $n$
states, with the noise on subsequent transmissions being correlated.
See the reference cited above for details or Section III for an example.



III. THE DEPOLARIZING MEMORY 
CHANNEL 

Treating information or noise sources as independent 
random variables is a successful but crude first approxi- 
mation. To improve the modelling process and to achieve 
better performance in real world applications, the inde- 
pendence assumption needs to be removed. The first step 
in this direction is to introduce forgetful noise memory. A 
forgetful noise process is one which after sufficiently long 
time, 'forgets' or is independent of previous noise. Thus, 
here the independence is pushed further away, allowing 
a space to study the effects of short-term memory. With 
the theoretical tools in place, it is instructive to study 
even very simple models to see the effects of memory on 
the classical capacity. 



A. Construction of the Channel 

The forgetful channel is constructed by combining two
memoryless single qubit depolarizing channels ($\mathcal{E}_0$ and
$\mathcal{E}_1$), switching between them using a two-state Markov
chain ($Q = (q_{ij})$, $i,j \in \{0,1\}$). Thus, $Q$ is the $2 \times 2$
Markovian channel selection matrix with $q_{ij}$ being the
probability of switching from channel $i$ to channel $j$.
Hence, $q_{ij} \ge 0$ and $q_{i0} + q_{i1} = 1$ for $i,j \in \{0,1\}$. The
channel is forgetful when the Markov chain is aperiodic
and irreducible.

The depolarizing channels can be written as
$\mathcal{E}_i(\rho) = x_i^0 \rho + x_i^1 (\mathbb{1} - \rho)$. These single qubit channels can be
thought of as probabilistically mixing the identity channel
(with probability $x_i^0$) and the 'flip' channel (with probability
$x_i^1 = 1 - x_i^0$) acting on a single qubit density operator
$\rho$. However, this form is completely positive only
for $1/3 \le x_i^0 \le 1$.
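
To make the action of a single sub-channel concrete, here is a minimal Python sketch of $\mathcal{E}(\rho) = x^0 \rho + x^1 (\mathbb{1} - \rho)$ acting on a 2x2 density matrix. The function name `depolarize` and the nested-list representation are hypothetical choices for this illustration.

```python
def depolarize(rho, x0):
    """Apply E(rho) = x0 * rho + (1 - x0) * (1 - rho) to a 2x2 density matrix."""
    if not 1.0 / 3.0 <= x0 <= 1.0:
        raise ValueError("x0 must lie in [1/3, 1] for complete positivity")
    x1 = 1.0 - x0
    identity = [[1.0, 0.0], [0.0, 1.0]]
    return [[x0 * rho[r][c] + x1 * (identity[r][c] - rho[r][c]) for c in range(2)]
            for r in range(2)]

# A pure input |0><0| keeps weight x0 and acquires weight 1 - x0 on the flipped state.
pure_zero = [[1.0, 0.0], [0.0, 0.0]]
output = depolarize(pure_zero, 0.75)
```

For a pure input the two eigenvalues of the output are simply the no-flip and flip probabilities, which is the single-use version of the eigenvalue structure used in the capacity calculation.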

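The built-up channel $\Lambda_n$, corresponding to $n$ successive
uses, is

$$ \Lambda_n : \rho_1 \otimes \cdots \otimes \rho_n \mapsto \sum_{i_1, \ldots, i_n} \gamma_{i_1} q_{i_1 i_2} \cdots q_{i_{n-1} i_n}\, \mathcal{E}_{i_1}(\rho_1) \otimes \cdots \otimes \mathcal{E}_{i_n}(\rho_n) . \tag{2} $$

The sum is over all possible paths $(i_1, \ldots, i_n) \in \{0,1\}^n$
and each term is a tensor product of the selected sub-channels
weighted by the probability of occurrence ($\gamma_i$ is
the initial probability of selection, set to the stationary
distribution of the Markov process: $Q^T \gamma = \gamma$).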



B. Classical Capacity 

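We calculate the capacity with this $n$-use form of the
channel and regularize by taking the limit $n \to \infty$, as
in the regularized capacity expression of Section II. Since we are
looking at the product state capacity, we choose

$$ \rho_l = \Phi^{(n)}(l) = \Phi^{(n)}(l_1, \ldots, l_n) := |l_1\rangle\langle l_1| \otimes \cdots \otimes |l_n\rangle\langle l_n| , $$

where the $l_j$ are arbitrary pure qubit states.
Applying the channel $\Lambda_n$, we get

$$ \Lambda_n(\Phi^{(n)}(l)) = \sum_{i_1, \ldots, i_n} \gamma_{i_1} q_{i_1 i_2} \cdots q_{i_{n-1} i_n} \left( x_{i_1}^0 |l_1 \oplus 0\rangle\langle l_1 \oplus 0| + x_{i_1}^1 |l_1 \oplus 1\rangle\langle l_1 \oplus 1| \right) \otimes \cdots \otimes \left( x_{i_n}^0 |l_n \oplus 0\rangle\langle l_n \oplus 0| + x_{i_n}^1 |l_n \oplus 1\rangle\langle l_n \oplus 1| \right) , $$

where $(l_i \oplus 1)$ denotes the qubit state with a flipped Bloch
vector with respect to $l_i = (l_i \oplus 0)$:

$$ |l_i \oplus 1\rangle\langle l_i \oplus 1| = \mathbb{1} - |l_i \oplus 0\rangle\langle l_i \oplus 0| . $$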

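By expanding the product above we see that the eigenvalues
of the output state are given by

$$ \lambda_n(k) = \sum_{i_1, \ldots, i_n} \gamma_{i_1} q_{i_1 i_2} \cdots q_{i_{n-1} i_n}\, x_{i_1}^{k_1} \cdots x_{i_n}^{k_n} , \qquad k = (k_1, \ldots, k_n) \in \{0,1\}^n . \tag{3} $$

Note that these eigenvalues are independent of the choice
of the input state.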
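
The path sum for the output eigenvalues can be evaluated directly for small $n$. The sketch below is a brute-force illustration only: it enumerates all $2^n$ channel paths per eigenvalue. The names `stationary` and `eigenvalue`, the convention `x[i][e]` for the probability that sub-channel `i` produces error `e` (0 for no flip, 1 for flip), and the example parameter values are assumptions of this example.

```python
from itertools import product

def stationary(Q):
    """Stationary distribution gamma of a two-state chain (Q^T gamma = gamma)."""
    q01, q10 = Q[0][1], Q[1][0]
    return [q10 / (q01 + q10), q01 / (q01 + q10)]

def eigenvalue(k, Q, x):
    """Eigenvalue for the flip pattern k, summed explicitly over all channel paths."""
    gamma = stationary(Q)
    total = 0.0
    for path in product((0, 1), repeat=len(k)):
        weight = gamma[path[0]]              # initial sub-channel selection
        for i, j in zip(path, path[1:]):
            weight *= Q[i][j]                # Markov switching probabilities
        for i, e in zip(path, k):
            weight *= x[i][e]                # flip / no-flip probability per use
        total += weight
    return total

# Example values chosen for illustration: strongly correlated switching.
Q = [[0.9, 0.1], [0.1, 0.9]]
x = [[1.0, 0.0], [1.0 / 3.0, 2.0 / 3.0]]
spectrum = [eigenvalue(k, Q, x) for k in product((0, 1), repeat=4)]
```

The values in `spectrum` are non-negative and sum to one, but the cost grows as $4^n$ for the full spectrum, which is why a different technique is needed for the regularized limit.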

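The channel output can now be written as

$$ \Lambda_n(\Phi^{(n)}(l)) = \sum_k \lambda_n(k)\, \Phi^{(n)}(l + k) , $$

where $l + k$ denotes the input word $l$ with the qubits flagged by $k$ flipped.
Hence, if we calculate the first term in the Holevo $\chi$ quantity
for $\pi$ the uniform distribution ($\pi_l = 1/2^n$), and $\Phi_l$
going over all the $\Phi^{(n)}(l)$, we see that

$$ \Phi_{\text{out}} = \sum_l \pi_l\, \Lambda_n(\Phi^{(n)}(l)) = \frac{1}{2^n} \sum_k \lambda_n(k) \sum_l \Phi^{(n)}(l + k) . $$

Since $l$ goes over all possible combinations, so does $l + k$,
so we can relabel them:

$$ \Phi_{\text{out}} = \frac{1}{2^n} \sum_k \lambda_n(k) \sum_{l'} \Phi^{(n)}(l') . $$

Since the eigenvalues in Eq. (3) sum to one, we see that
$\Phi_{\text{out}}$ is the maximally mixed state

$$ \Phi_{\text{out}} = \frac{1}{2^n} \sum_{l'} \Phi^{(n)}(l') = \frac{\mathbb{1}}{2^n} . $$

Thus, $S(\Phi_{\text{out}})$ is maximal and is equal to $\log_2(2^n) = n$.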
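The second term in the Holevo $\chi$ quantity is

$$ \sum_l \pi_l\, S\big(\Lambda_n(\rho_l)\big) = - \sum_k \lambda_n(k) \log_2 \lambda_n(k) . $$

Since the eigenvalues $\lambda_n(k)$ of $\Lambda_n(\rho_l)$ do not depend on
the choice of $\rho_l$, this term does not influence the maximization.
Hence our choice of $\pi$ and $\rho$ maximizes the
Holevo $\chi$ quantity.

Thus, the final expression for the regularized capacity is

$$ C^* = \lim_{n \to \infty} \frac{1}{n}\, C_{\text{class}}(\Lambda_n) = 1 - \lim_{n \to \infty} \frac{1}{n}\, S(\Lambda_n(\rho)) . \tag{4} $$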

If we were to calculate the output entropy using the
eigenvalues in Eq. (3), the calculation would require a number
of terms exponential in $n$. Therefore, other techniques are needed.
The way we approach the problem is by reformulating 
it as a hidden Markov process. The eigenvalues of the 
output state correspond to the probabilities of such a 
process. 

A hidden Markov process can be defined as follows.
If we have a translation-invariant measure $\nu$ with the
Markov property on $L^{\mathbb{Z}}$, where $L$ is a finite set, then
a hidden Markov measure $\mu$ can be constructed on $K^{\mathbb{Z}}$
through a function $\Phi : L \to K$, with the following local
densities

$$ \mu((\omega_m, \ldots, \omega_n)) = \sum_{\substack{\epsilon_m, \ldots, \epsilon_n \\ \Phi(\epsilon_m) = \omega_m, \ldots, \Phi(\epsilon_n) = \omega_n}} \nu((\epsilon_m, \ldots, \epsilon_n)) , \tag{5} $$

where $\omega_m, \ldots, \omega_n \in K$ and $\epsilon_m, \ldots, \epsilon_n \in L$. For obvious
reasons, these processes are also called functions of
Markov processes.



IV. ALGEBRAIC MEASURES 

An algebraic measure, $\mu$, is a translation-invariant
measure on a set $\{0, \ldots, q-1\}^{\mathbb{Z}}$, with probabilities determined
by matrices $E_a$ with positive entries, one for each
of the $q$ states. The probability of a sequence is obtained
by applying a positive linear functional $\sigma$ to a matrix
product of the corresponding matrices of the states of the
sequence: $\mu(i_1, \ldots, i_n) = \sigma(E_{i_1} \cdots E_{i_n})$. This matrix
algebraic construction is the reason for the name Algebraic
Measure, studied in detail in Ref. 0. As we shall see, 
the hidden Markov processes correspond to a set of al- 
gebraic measures with a specific positivity structure and 
remarkably, the converse holds too. 



A. Manifestly Positive Measures 

In [| it was shown that hidden Markov processes cor- 
respond to manifestly positive algebraic measures. The 
local densities of such a manifestly positive algebraic
measure on an infinite chain $K^{\mathbb{Z}}$ of classical state spaces
$K = \{0, \ldots, q-1\}$ are of the form

$$ \mu((\omega_1, \ldots, \omega_n)) = \langle \tau | E_{\omega_1} \cdots E_{\omega_n}\, \sigma \rangle , $$

where $\omega_i \in K$, $\tau$ and $\sigma$ are vectors in $\mathbb{R}^d$ with non-negative
elements (denoted $(\mathbb{R}^d)^+$) and the $E_i$ are $d \times d$
real matrices with non-negative elements (denoted $M_d^+$).

As an example of these manifestly positive algebraic measures,
let us look at a regular Markov chain $\mu((\omega_m, \ldots, \omega_n))$ on
$\{0, \ldots, q-1\}^{\mathbb{Z}}$. If we choose $\tau$, $\sigma$ and the $E_i$ as

$$ \sigma \in (\mathbb{R}^d)^+ : \sigma_a = 1 \quad \text{for } a \in K , $$
$$ \tau \in (\mathbb{R}^d)^+ : \tau_a = \mu((a)) \quad \text{for } a \in K , $$
$$ E_a \in M_d^+ : (E_a)_{b,c} = \delta_{a,b}\, \frac{\mu((b,c))}{\mu((b))} \quad \text{for } a, b, c \in K , $$

one can check that $\langle \tau | E_{\omega_1} \cdots E_{\omega_n}\, \sigma \rangle$ indeed gives the
correct densities.

From this example it is easy to see that if we have a
hidden Markov process on $L^{\mathbb{Z}}$ defined by a map $\Phi : K \to L$
and a Markov measure $\mu$ on $K^{\mathbb{Z}}$ with corresponding
matrices $E_a$, the manifestly positive algebraic measure
corresponding to the hidden Markov measure is given by
the same vectors $\sigma$ and $\tau$ as before and the following
matrices:

$$ F_a \in M_d^+ : F_a = \sum_{\epsilon :\, \Phi(\epsilon) = a} E_\epsilon \quad \text{for } a \in L . \tag{6} $$

For a proof of the converse, namely that every
manifestly positive algebraic measure corresponds to a
hidden Markov measure, we refer to [4].



B. Mean Entropy 

We will now briefly summarize how the algebraic mea- 
sure approach allows for a simpler approach to finding 
the entropy density [3, @ • 

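The entropy of a state $\mu$ on $K^{\mathbb{Z}}$ restricted to a region
$\Lambda \subset \mathbb{Z}$ is defined by

$$ S_\Lambda(\mu) = - \sum_{\omega_\Lambda} \mu(\omega_\Lambda) \log \mu(\omega_\Lambda) . $$

$S_\Lambda$ can be shown to be bounded by $\#\Lambda \log q$, monotonically
increasing in $\Lambda$ and strongly subadditive, that
is

$$ S_{\Lambda_1 \cap \Lambda_2}(\mu) + S_{\Lambda_1 \cup \Lambda_2}(\mu) \le S_{\Lambda_1}(\mu) + S_{\Lambda_2}(\mu) . $$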

Using the strong subadditivity of the entropy and the 
translational invariance of the measure, one can show 
that [11 



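$$ S(\mu) = \lim_{n \to \infty} \frac{S(\mu_n)}{n} = \lim_{n \to \infty} \big[ S(\mu_n) - S(\mu_{n-1}) \big] , $$

where $\mu_n$ denotes the restriction of $\mu$ to $n$ consecutive sites.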

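We can then use this relation together with the expression
for the local densities of the manifestly positive
measures to reformulate the convergence of the mean entropy
into a dynamical system of converging measures on
the set of $d$-dimensional probability measures $B_\sigma$ as

$$ S(\mu) = \lim_{n \to \infty} \sum_{a \in K} \int \phi_n(d\nu)\, h_a(\nu) , $$

with

$$ \mu((\epsilon_0, \ldots, \epsilon_n)) = \langle \tau | E_{\epsilon_0} \cdots E_{\epsilon_n}\, \sigma \rangle \quad \text{with } \sigma, \tau \in (\mathbb{R}^d)^+ , $$
$$ B_\sigma = \{ \nu \in (\mathbb{R}^d)^+ \mid \langle \nu | \sigma \rangle = 1 \} , $$
$$ h_a(\nu) = - \langle \nu | E_a \sigma \rangle \log \langle \nu | E_a \sigma \rangle , $$
$$ \phi_n(d\nu) = \sum_{\epsilon_0, \ldots, \epsilon_n \in K} \mu((\epsilon_0, \ldots, \epsilon_n))\; \delta_{\nu_{\epsilon_0 \cdots \epsilon_n}}(d\nu) , \qquad \nu_{\epsilon_0 \cdots \epsilon_n} = \frac{\tau E_{\epsilon_0} \cdots E_{\epsilon_n}}{\mu((\epsilon_0, \ldots, \epsilon_n))} . $$

If we define the linear transformation $T_\mu$ on functions
on $B_\sigma$,

$$ (T_\mu f)(\nu) = \sum_{a \in K} \langle \nu | E_a \sigma \rangle\, f\!\left( \frac{\nu E_a}{\langle \nu | E_a \sigma \rangle} \right) , $$

one can show that $\phi_n(f) = \phi_0(T_\mu^n f)$. $T_\mu$ is a contraction map,
so a fixed point argument can be used to show that $\phi_n$
converges to a unique measure $\phi$ that is invariant under $T_\mu$:

$$ \phi(T_\mu f) = \phi(f) . $$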

This measure then allows us to calculate the mean entropy

$$ S(\mu) = \sum_{a \in L} \int \phi(d\nu)\, h_a(\nu) . \tag{7} $$



Our goal in the remaining part of the article is to trans- 
late the switching depolarizing channel into the setting 
of algebraic measures and to try and find the invariant 
measure that allows us to calculate the mean entropy. 



V. ALGEBRAIC MEASURE OF THE CHANNEL 

The relationship between the hidden Markov measure,
say $\mu'$ on $K^{\mathbb{Z}}$, and the underlying Markov measure $\nu$
with the Markov property on $L^{\mathbb{Z}}$ is through a 'tracing'
function $\Phi : L \to K$, as is shown in Eq. (5).

The underlying Markov process for the overall quantum
channel has a four state configuration space corresponding
to channel selection and error occurrence:
$K = \{(0,0), (0,1), (1,0), (1,1)\}$. The first index indicates
which depolarizing channel has been chosen and the second
indicates whether a bit flip occurred. The elements
of the transition matrix, $E$, for this process are then given
by

$$ E_{(i,k),(i',k')} = q_{ii'}\, x_{i'}^{k'} : \tag{8} $$

the probability of going from $(i,k)$ to $(i',k')$ is given by
the switching probability $q_{ii'}$ from channel $i$ to $i'$, multiplied
by the probability $x_{i'}^{k'}$ that channel $i'$ produces the
error-occurrence $k'$.

The function that produces the correct hidden Markov
process is then given by

$$ \Phi : K \to L : (i,k) \mapsto k . $$

This function reflects the fact that we are unaware of the
choice of channel that has been made. The only effect
that is visible from the outside is whether or not an input
qubit has been flipped. Thus, $\Phi$ has to 'trace out' the
choice of channel. $\Phi$ maps into the two-state error configuration
space containing 'no flip' and 'flip': $L = \{0, 1\}$.

Using the fact that the matrices $E_{(i,k)}$ defining the algebraic
measure of a Markov process (Sec. IV A; $a = (i,k) \in K$)
have only one non-zero row and Eq. (6),
we get the matrices $F_0$ and $F_1$ that define the algebraic
measure corresponding to $\mu'$. The matrix corresponding
to 0, the first element of $L$, is given by

$$ F_0 = \sum_{(i,k) :\, \Phi((i,k)) = 0} E_{(i,k)} = \sum_i E_{(i,0)} = \begin{pmatrix} q_{00} x_0^0 & q_{00} x_0^1 & q_{01} x_1^0 & q_{01} x_1^1 \\ 0 & 0 & 0 & 0 \\ q_{10} x_0^0 & q_{10} x_0^1 & q_{11} x_1^0 & q_{11} x_1^1 \\ 0 & 0 & 0 & 0 \end{pmatrix} , $$

and similarly for 1, the second element of $L$.

The hidden Markov process then gives us almost the
same probabilities as the eigenvalues in Eq. (3):

$$ \mu'((k_1, \ldots, k_n)) = \langle \tau | F_{k_1} \cdots F_{k_n} \mathbb{1} \rangle = \sum_{i_1, \ldots, i_n} \tau_{(i_1,k_1)}\, q_{i_1 i_2} x_{i_2}^{k_2} \cdots q_{i_{n-1} i_n} x_{i_n}^{k_n} . $$

Note that according to our discussion in Section IV the
vector $\tau$ is the stationary distribution of the full matrix
$E$. Using Eq. (8), one can see that the invariant distribution
$\tau$ is in fact $\tau_{(i,k)} = \gamma_i x_i^k$, so the probabilities of
the hidden Markov process coincide with the eigenvalues
in Eq. (3).
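
The algebraic-measure form of these probabilities can be checked numerically. The following minimal Python sketch builds the four-state transition matrix and the two hidden-process matrices and evaluates $\langle \tau | F_{k_1} \cdots F_{k_n} \mathbb{1} \rangle$ with a cost linear in $n$. The name `hidden_word_probability`, the state ordering in `STATES`, and the convention `x[i][e]` (probability that sub-channel `i` produces error `e`) are assumptions of this illustration.

```python
STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (sub-channel i, error k)

def hidden_word_probability(word, Q, x, gamma):
    """<tau| F_{k_1} ... F_{k_n} |1> for a flip pattern `word`."""
    # Transition matrix: entry [(i,k),(i',k')] = q_{ii'} * x_{i'}^{k'}.
    E = [[Q[i][j] * x[j][e] for (j, e) in STATES] for (i, _) in STATES]
    # F_k keeps the rows of E that belong to states with error index k.
    F = [[row if STATES[r][1] == k else [0.0] * 4 for r, row in enumerate(E)]
         for k in (0, 1)]
    v = [gamma[i] * x[i][e] for (i, e) in STATES]  # tau_(i,k) = gamma_i * x_i^k
    for k in word:
        # Row vector times matrix: v <- v F_k.
        v = [sum(v[r] * F[k][r][c] for r in range(4)) for c in range(4)]
    return sum(v)  # pairing with the all-ones vector

# Example values chosen for illustration (symmetric switching, so gamma is uniform).
Q = [[0.9, 0.1], [0.1, 0.9]]
x = [[1.0, 0.0], [1.0 / 3.0, 2.0 / 3.0]]
p_no_flips = hidden_word_probability((0, 0, 0), Q, x, [0.5, 0.5])
```

Each additional channel use costs one 4x4 vector-matrix product, in contrast with the exponential number of paths in the explicit eigenvalue sum.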

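Having constructed the correct algebraic measure, we
can determine $T_\mu$ explicitly and use it to greatly simplify
the corresponding invariant measure $\phi$.

The expression for $T_\mu$ (cf. Section IV B) is

$$ (T_\mu f)(\nu) = \sum_{a \in L} \langle \nu | F_a \mathbb{1} \rangle\, f\!\left( \frac{\nu F_a}{\langle \nu | F_a \mathbb{1} \rangle} \right) , $$

where $\nu$ is any 4-dimensional vector such that $\langle \nu | \mathbb{1} \rangle = 1$
and $f$ is a continuous real-valued function on the set of
such vectors. For the case of our hidden Markov measure,
the form of this transformation can be greatly simplified.
Due to the stochasticity of the matrix $E$, we have the
following:

$$ F_0 |\mathbb{1}\rangle = (1, 0, 1, 0)^T \quad \text{and} \quad F_1 |\mathbb{1}\rangle = (0, 1, 0, 1)^T . $$

If we furthermore denote the four row vectors of $E$ by
$\mu_1$, $\mu_2$, $\mu_3$ and $\mu_4$, we can write

$$ F_0^* \nu = \nu_1 \mu_1 + \nu_3 \mu_3 \quad \text{and} \quad F_1^* \nu = \nu_2 \mu_2 + \nu_4 \mu_4 . $$

On top of this, $\mu_1 = \mu_2$ and $\mu_3 = \mu_4$, so the total form
of the transformation becomes

$$ (T_\mu f)(\nu) = (\nu_1 + \nu_3)\, f\!\left( \frac{\nu_1 \mu_1 + \nu_3 \mu_3}{\nu_1 + \nu_3} \right) + (\nu_2 + \nu_4)\, f\!\left( \frac{\nu_2 \mu_1 + \nu_4 \mu_3}{\nu_2 + \nu_4} \right) . $$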

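From this form of the transformation, we can already
greatly restrict the support of $\phi$. Our claim is that the
support of $\phi$ is restricted to the set of convex combinations
of $\mu_1$ and $\mu_3$:

$$ \mathrm{supp}(\phi) \subset \{ a \mu_1 + (1-a) \mu_3 \mid a \in [0,1] \} . $$

To show this, let us suppose that $\tilde\nu \in \mathrm{supp}(\phi)$ and
$\tilde\nu \notin S := \{ a \mu_1 + (1-a) \mu_3 \mid a \in [0,1] \}$. Take $\zeta_{\tilde\nu}$ a function on
$B_\sigma$ such that $\zeta_{\tilde\nu}(s) = 0$ for all $s \in S$ and $\zeta_{\tilde\nu}(\tilde\nu) \neq 0$, then

$$ \phi(\zeta_{\tilde\nu}) = \phi(T_\mu \zeta_{\tilde\nu}) = \int \phi(d\nu)\, (T_\mu \zeta_{\tilde\nu})(\nu) = \int \phi(d\nu) \left[ (\nu_1 + \nu_3)\, \zeta_{\tilde\nu}\!\left( \frac{\nu_1 \mu_1 + \nu_3 \mu_3}{\nu_1 + \nu_3} \right) + (\nu_2 + \nu_4)\, \zeta_{\tilde\nu}\!\left( \frac{\nu_2 \mu_1 + \nu_4 \mu_3}{\nu_2 + \nu_4} \right) \right] . $$

However, this integral is equal to zero, since the arguments
to $\zeta_{\tilde\nu}$ run over the set $S$.
Therefore, we have for $f \in C(B_\sigma)$,

$$ \phi(f) = \int d\lambda(a)\, f(a \mu_1 + (1-a) \mu_3) , \tag{9} $$

with $\lambda$ a measure on $[0,1]$.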

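Now let us look at $\phi$ acting on the transformed $f$:

$$ \phi(T_\mu f) = \int \phi(d\nu) \left[ (\nu_1 + \nu_3)\, f\!\left( \frac{\nu_1 \mu_1 + \nu_3 \mu_3}{\nu_1 + \nu_3} \right) + (\nu_2 + \nu_4)\, f\!\left( \frac{\nu_2 \mu_1 + \nu_4 \mu_3}{\nu_2 + \nu_4} \right) \right] $$
$$ = \int d\lambda(a) \left[ (\mu_{a,1} + \mu_{a,3})\, f\!\left( \frac{\mu_{a,1} \mu_1 + \mu_{a,3} \mu_3}{\mu_{a,1} + \mu_{a,3}} \right) + (\mu_{a,2} + \mu_{a,4})\, f\!\left( \frac{\mu_{a,2} \mu_1 + \mu_{a,4} \mu_3}{\mu_{a,2} + \mu_{a,4}} \right) \right] , \tag{10} $$

where $\mu_a = a \mu_1 + (1-a) \mu_3$ and $\mu_{a,j}$ denotes its $j$-th component.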



By invariance (Sec. IV B), we can equate Eq. (9)
and the above Eq. (10) to discover an invariance
concerning $\lambda$. We thus arrive at the following symmetry
of $\lambda$:

$$ \lambda = T[\lambda] = a \mapsto c_1(a)\, \lambda[f_1(a)] + c_2(a)\, \lambda[f_2(a)] . $$

The two functions $f_1$ and $f_2$ are relatively simple shrink
functions about two separate points in the domain $[0,1]$,
that shrink the $[0,1]$ domain into two (possibly overlapping)
sub-intervals of $[0,1]$.

We can turn this analytic symmetry into a cyclic definition
or iterative procedure to generate $\lambda$ up to some
approximation $\lambda_n$:

$$ \lambda_{n+1} = T(\lambda_n) . $$

We still have not defined $\lambda_0$, but taking a look at the
iterative procedure, we see that there exist fixed points
of the two shrink functions, call them $a_1$ and $a_2$,

$$ a_1 = f_1(a_1) , \quad a_2 = f_2(a_2) , \quad a_1, a_2 \in [0,1] . $$

With this observation the idea is to begin the iteration
procedure with two Dirac deltas at these fixed points,

$$ \lambda_0(a) = \tfrac{1}{2} \delta(a - a_1) + \tfrac{1}{2} \delta(a - a_2) . $$

Note that $\int \lambda_0(a)\, da = 1$, as required of a probability measure. Since
the convergence is unique, the initial weightings
should not matter [1].

To see that this is a good starting point and to get
further insight into the support of $\lambda$, it can be seen that
the support will grow, but most importantly, once a point
is within the support of $\lambda_m$ it remains there for all $n > m$.
So if the procedure is taken to infinity the support is fixed
and countably infinite. Thus, we arrive at the following
expression for the full support,



$$ \mathrm{supp}(\lambda) = \{ a \in [0,1] : \exists n \in \mathbb{N},\ \exists k_i \in \{1,2\}\ \forall i \in [1,n],\ f_{k_n} \circ f_{k_{n-1}} \circ \cdots \circ f_{k_1}(a_1 \text{ or } a_2) = a \} . $$

FIG. 1: Capacity for maximally different sub-channels increases with memory

We use this iterative procedure to generate $\lambda_n$ and
then use it in Eq. (9) to approximate the measure. The
entropy in Eq. (7) can then be calculated and finally
we use the entropy to calculate the capacity through Eq. (4).
It is the capacity and its dependence on memory
that we are interested in.
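
The iteration can be sketched in a few lines of Python by representing the approximating measure as a list of weighted atoms on $[0,1]$. This is a minimal illustration under stated assumptions, not the implementation used for the figures: the names `branches` and `capacity_estimate` are hypothetical, the iteration starts from a single atom at the stationary weight $\gamma_0$ rather than from two Dirac deltas at the fixed points of the shrink functions, no atoms are merged (so the list holds up to $2^{\text{steps}}$ atoms), and `x[i][e]` denotes the probability that sub-channel `i` produces error `e`.

```python
import math

def binary_entropy(p):
    """Shannon entropy in bits of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def next_channel_weight(a, Q):
    """Weight on sub-channel 0 at the next use, for the point a*mu_1 + (1-a)*mu_3."""
    return a * Q[0][0] + (1.0 - a) * Q[1][0]

def branches(a, Q, x):
    """Return [(weight, new a)] for the 'no flip' and 'flip' branches of the map."""
    p0 = next_channel_weight(a, Q)
    result = []
    for e in (0, 1):
        w = p0 * x[0][e] + (1.0 - p0) * x[1][e]   # total weight of outcome e
        if w > 0.0:
            result.append((w, p0 * x[0][e] / w))  # renormalised weight on mu_1
    return result

def capacity_estimate(Q, x, steps=12):
    """Estimate C* = 1 - mean entropy after `steps` iterations of the map."""
    # Assumption: a single starting atom at the stationary weight gamma_0.
    atoms = [(1.0, Q[1][0] / (Q[0][1] + Q[1][0]))]
    for _ in range(steps):
        atoms = [(m * w, b) for m, a in atoms for w, b in branches(a, Q, x)]
    entropy = 0.0
    for m, a in atoms:
        p0 = next_channel_weight(a, Q)
        entropy += m * binary_entropy(p0 * x[0][0] + (1.0 - p0) * x[1][0])
    return 1.0 - entropy
```

With this starting point the entropy after $n$ steps is the conditional entropy of the next flip given the previous $n$ flips, so the estimate approaches the regularized capacity from below as `steps` grows. For uncorrelated switching (all entries of `Q` equal to 1/2) every atom gives the same value and the result reduces to the capacity of the average channel.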



VI. RESULTS 

In constructing our channel we defined certain parameters.
It is useful to introduce a new set of suggestive
parameters in terms of the old and also to reduce their
number by making some assumptions. Firstly, we assume
that the sub-channels switch symmetrically, that is, the
probabilities of reuse are the same for both sub-channels.
This makes the Markov matrix doubly stochastic and
allows us to use its non-one eigenvalue as a useful characterizing
parameter $s$. Thus, we set $q_{00} = q_{11} = (1+s)/2$
and $q_{01} = q_{10} = (1-s)/2$. The domain of $s$ is $(-1,1)$, with
$s = 0$ corresponding to no noise correlations. Secondly,
we parametrize the error probabilities by their average
and difference: $x_0^0 = a + d$, $x_1^0 = a - d$.

The main result is that the capacity increases with
stronger noise-correlations. This manifests itself in two
ways. Firstly, if we make the switching more correlated
($s$ away from 0) the capacity increases and secondly, if we
increase the difference between the two sub-channels the
capacity also increases. Similar results have been found
for the quantum capacity of the dephasing channel with
Markovian memory.

FIG. 2: Capacity versus the average no-error probability a

In Figure 1, $d$ is set to the maximum possible value
while keeping an average of $a$ ($d = \min[a - 1/3,\, 1 - a]$).
Remember that both $a - d$ and $a + d$ have to lie in the
$[1/3, 1]$ interval for the two sub-channels to be completely
positive. The capacity is plotted against varying $a$ and
$s$. We can see that the capacity increases as the noise-correlation
($s$) gets stronger. When $a = 2/3$, $d$ attains
its maximum ($1/3$) and the effect of increasing $s$ on the
capacity is greatest. Another interesting observation concerns
the case when the two sub-channels average to the maximally
mixing channel ($a = 1/2$, which, ignoring memory,
has zero capacity): taking memory effects into account,
there is a non-zero capacity.
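
The reparametrization in terms of $s$, $a$ and $d$ can be written down explicitly. The helper below is a minimal sketch with a hypothetical name (`channel_parameters`); it returns the switching matrix and the flip probabilities in the convention `x[i][e]` (probability that sub-channel `i` produces error `e`), which is an assumption of this illustration, and it assigns $a + d$ to sub-channel 0.

```python
def channel_parameters(s, a, d=None):
    """Return (Q, x) for memory parameter s, average a and separation d.

    If d is omitted, the maximal separation d = min(a - 1/3, 1 - a) is used.
    """
    if not -1.0 < s < 1.0:
        raise ValueError("s must lie in the open interval (-1, 1)")
    if d is None:
        d = min(a - 1.0 / 3.0, 1.0 - a)
    # Small tolerance so that boundary values survive floating-point rounding.
    if d < 0.0 or a - d < 1.0 / 3.0 - 1e-12 or a + d > 1.0 + 1e-12:
        raise ValueError("a - d and a + d must both lie in [1/3, 1]")
    Q = [[(1 + s) / 2, (1 - s) / 2], [(1 - s) / 2, (1 + s) / 2]]
    x = [[a + d, 1 - (a + d)], [a - d, 1 - (a - d)]]
    return Q, x

# Maximally different sub-channels at a = 2/3: one noiseless, one at the boundary 1/3.
Q, x = channel_parameters(0.5, 2.0 / 3.0)
```

Since the trace of `Q` equals $1 + s$ and one eigenvalue of a stochastic matrix is 1, the other eigenvalue is $s$, in line with its use as the memory parameter.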

To better illustrate the last point and to further explore
the relationship between the capacity of the memory
channel and its sub-channels, we plot in Figure 2
slices of Figure 1 of fixed $s$ together with plots of the
underlying sub-channel capacities.

FIG. 3: Capacity versus the memory parameter s using many
iterations and including full Markov calculation

Thus, in the 'Avg Capacity' curve of Figure 2 we see
the edge of Figure 1 (for fixed $s = 1$, equivalently $s = -1$,
not actually attained), which corresponds to the average
of the capacities of the sub-channels. The sub-channels'
separate capacities are plotted in curves labelled 'Low
Noise Sub' and 'Noisier Sub'. They are chosen to have
maximum allowed separation for each point as $a$ varies
(hence the artificial discontinuities). In a real world
example, this separation parameter is fixed by the channel,
and the sub-channels and their capacities would not
be accessible. The capacity of the average channel, labelled
'Avg Channel', corresponds to a slice of fixed $s = 0$
(the center of Figure 1), since a no-memory/non-biased
Markov walk factors into a tensor product of the average
of the sub-channels, which is thus equivalent to just one
depolarizing channel with the average error probability.
The curve 'With Memory' is a smooth intermediary between
the 'Avg Channel' and 'Avg Capacity' curves and is an
example slice of Figure 1 for a fixed intermediate value $0 < s < 1$,
which illustrates how taking memory into account improves the
capacity. Of course, again, in a real world example this
parameter is specified by the channel. The smooth transformation
is neither straightforward nor linear, which can be
seen in the way Figure 1 curves for varying $s$.
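
The two memory-independent reference curves discussed above can be reproduced with a short calculation. The sketch below assumes that a single depolarizing sub-channel with no-error probability $x^0$ has product state capacity $1 - H_2(x^0)$, which is what Eq. (4) gives for $n = 1$; the function name `reference_curves` and the dictionary keys are chosen for this illustration, and the 'Low Noise Sub' and 'Noisier Sub' labels are assigned here by comparing the two capacities.

```python
import math

def binary_entropy(p):
    """Shannon entropy in bits of the distribution (p, 1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def reference_curves(a):
    """Capacities at average no-error probability a with maximal separation d."""
    d = min(a - 1.0 / 3.0, 1.0 - a)
    c_plus = 1.0 - binary_entropy(a + d)
    c_minus = 1.0 - binary_entropy(a - d)
    return {
        "Low Noise Sub": max(c_plus, c_minus),
        "Noisier Sub": min(c_plus, c_minus),
        "Avg Capacity": 0.5 * (c_plus + c_minus),   # limit of strong correlations
        "Avg Channel": 1.0 - binary_entropy(a),     # no correlations, s = 0
    }

curves = reference_curves(0.5)
```

At $a = 1/2$ the 'Avg Channel' value vanishes while the 'Avg Capacity' value stays positive, which is the gap that memory can exploit. Because $1 - H_2$ is convex, the 'Avg Capacity' value is never below the 'Avg Channel' value.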

To see the last point more clearly and also to indicate
the convergence of the iteration procedure we next plot
a slice of Figure 1 for fixed $a$. In Figure 3 we plot the
regularized capacity against $s$ with the following fixed
parameters: $a = 2/3$, $d = 1/3$.

We can see that the capacity increases as the noise-correlation
gets stronger. The blue dots are calculated
using a simplified ($s = 1$) full Markov walk calculation
(1000 steps) which doesn't suffer from the usual exponential
blow-up. The horizontal green line is the output
entropy for $s = 0$, which corresponds to no correlations
and is equivalent to having only one depolarizing
channel.



A. Non-Forgetful Limit 

To complete the discussion concerning correlations we
need to look at the two extreme cases: $s = 1$, corresponding
to the case where a sub-channel is selected and
used for every channel use afterwards, and $s = -1$, corresponding
to the case where the choice of sub-channel is
flipped with every channel use. Therefore, in constructing
the overall channel and taking into account the initial
random channel selection, we just have the mixing of two
$n$-use channels. Specifically, in the $s = 1$ case, we have
the mixing of the two $n$-fold tensor products of the two
sub-channels separately, and in the $s = -1$ case, we have
the mixing of two $n$-use channels where each deterministically
alternates between the sub-channels but starting
with a different sub-channel.

Both these extreme cases are non-forgetful since the
initial sub-channel selection (the initial noise) is 'remembered'
and the forgetful Holevo capacity theorem no
longer applies (the Markov selection matrix is periodic
in the $s = -1$ case and reducible in the $s = 1$ case).
While our forgetful channel approach breaks down, there
are alternate theoretical frameworks that do capture
these extreme cases. For $s = -1$ the capacity can
be calculated and agrees with the limit of the
forgetful approach: the capacity is the average capacity
of the two sub-channels separately. However, for the $s = 1$
case there is a discontinuity and the capacity suddenly
drops to the minimum capacity of the sub-channels [10].

The intuition is that in the s = — 1 case, the deter- 
ministic flip can be used to determine 'on-the-fly' which 
sub-channel is being used and then it is the same as using 
the two channels separately each half the time, so the ca- 
pacity must be the average capacity. For the s = 1 case 
once you have the poorer channel you are stuck with it 
forever and so because of the mixture you can only guar- 
antee the lower capacity. 



VII. CONCLUSION 

We have constructed a simple forgetful noise-memory 
quantum channel. The noise-correlation is a function of 
the underlying hidden Markov process. This setup al- 
lowed us to construct a corresponding algebraic measure. 
We used the measure in an algebraic asymptotic entropy 
expression. Without this, the entropy would be very dif- 
ficult to compute, involving exponentially many paths in 
configuration space. 

We studied the effects that the noise correlations had
on the classical capacity and discovered that the capacity
increases with stronger correlations. This is sensible
because the correlations can be used to combat the noise
when coding information. We have arrived at the understanding
that stronger correlations increase the capacity
from that of the average channel to the average capacity
of the sub-channels, with very interesting limiting
behaviour.

Further work includes using other approximation tech- 
niques, arriving at a full analytic expression of the ca- 
pacity and looking at other similarly constructed chan- 
nels. We are also confident and hopeful that the hid- 
den Markov technique could be successfully employed in 
other contexts. 